ABSTRACT


Guibas and Wyatt have presented a scheme which permits the space efficient implementation
of a number of APL operators. The functions involved, the so-called grid selectors,
are those that operate on the coordinate accessing of their arguments rather than on the
argument elements themselves. Examples of these functions include take, drop, transpose
and reversal. Using this method, adjacent grid selectors are merged together to form a single
function, called a universal selector.


In this paper we extend the Guibas method to incorporate dyadic scalar operators that 
occur in conjunction with grid selectors. We also increase the versatility of the method by 
showing how many of the restrictions of the original paper can be eliminated. In particular, 
we show how to generate code for grid selectors that are not accessed in sequential ravel 
order. These methods have been successfully implemented as part of an APL compiler. 
1. Introduction


We begin with a brief discussion of the ideas presented by Guibas and Wyatt [2]. Those familiar with the 
Guibas paper will notice subtle alterations in the practical application; however, the theory remains identical. 
These alterations originated through the implementation of the method in an APL compiler [1] different from 
that utilized by Guibas and Wyatt. More than anything this shows the practicality of the theory. 


We extend the Guibas method in two main directions. First, we explore the interaction of dyadic scalar
operators with the grid selectors. It was proposed in the Guibas paper that the grid selectors could be distributed
down through the expression tree, past dyadic scalar operators, to act upon the leaves of the tree.
Several results arise from the discussion of this combination in cases where scalar extension occurs. Second,
we show how the Guibas method can be extended to situations involving non-ravel accessing of the mergeable
sections of the expression tree.


2. Selector Merging 


The basic idea of merging the grid selectors is similar to that of the composition of linear transformations.
The result of applying each linear transformation individually is identical to composing the linear transformations
into a single linear transformation and applying this to the operand. With linear transformations, the
common medium for the composition is the matrix and matrix multiplication. In order to merge the grid
selectors we will use a structure called a selector, of which the stepper is the integral part. Strictly speaking, the
stepper is used to construct accessors, the accessors performing the actual coordinate transformations.


Consider a portion of an expression tree consisting of adjacent grid selectors (Figure 1). Throughout this 
discussion this portion will be referred to as the tree fragment and the result of the tree fragment will be the 
result at the top of the tree fragment. 


Figure 1: Tree Fragment.


2.1 The Selector 


The selector at a node in an expression tree is characterized by four arrays: q, s, d, l, each of size n, where n
is the rank of the result of a particular node. Let r represent the rank of the result of the tree fragment. The
arrays q, s, d, and l are defined as follows:


q_i: the dimension of the result which the i-th
dimension of the current node corresponds to.


s_i: the index along the i-th coordinate of the
current node which contributes to the initial
element of the result in ravel order.


d_i: the direction to move along the i-th dimension
in order to arrive at the next element of the
result along the q_i-th coordinate. 1 indicates
forward, -1 backward.


l_i: the shape of the result. (Note that l is only
defined for i in [1,r].)


For each node, the stepper of that node’s son is altered as below. Let primed quantities indicate the new
values, let c_i denote the control argument, and let the shape at the node be represented by the array t, defined
on [1,n]. We define the expansion vector, η, at a node by the recursive definition: η_n = 1, η_i = t_{i+1} * η_{i+1}.


At the root of the tree fragment, the stepper is initialized as follows: q_i ← i; s_i ← 0; d_i ← 1; l_i ← t_i.


Monadic Transpose
for i in [1,n]
    q'_i ← q_{n-i+1}
    s'_i ← s_{n-i+1}
    d'_i ← d_{n-i+1}


Dyadic Transpose - p denotes the rank of the son
for i in [1,p]


Take
for i in [1,n]
    q'_i ← q_i
    if c_i < 0 then s'_i ← s_i + t_i + c_i
    else s'_i ← s_i
    d'_i ← d_i


Drop
for i in [1,n]
    q'_i ← q_i
    if c_i ≥ 0 then s'_i ← s_i + c_i
    else s'_i ← s_i
    d'_i ← d_i


Reversal along k-th coordinate
for i in [1,n]
    q'_i ← q_i
    if i = k then s'_i ← t_i - s_i - 1
    else s'_i ← s_i
    if i = k then d'_i ← -d_i
    else d'_i ← d_i


Consider the following expression: 
    1 1 2 ⍉ ¯3 4 5 ↑ ⍉ ⊖ A


where A is initialized with the value 2 2 2 ρ ¯1 + ι 8. The Guibas method shows how the four grid selectors
in this expression may be composed into a single universal selector.


Figure 2: Expression tree for 1 1 2 ⍉ ¯3 4 5 ↑ ⍉ ⊖ A.


The table below displays the effects of each successive node on the stepper as control is passed down the 
tree fragment. The stepper is initialized at the root of the tree fragment and the result after each operator is 
shown for each of the four arrays. 
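
As an illustration only, the propagation can be sketched in Python. The sketch assumes that the example expression reads 1 1 2 ⍉ ¯3 4 5 ↑ ⍉ ⊖ A, that start indices s are zero-origin (so a reversal maps s to t - s - 1), that t in the take rule is the shape of the take's operand, and that dyadic transpose gives son dimension i the entries of node dimension c_i. The dyadic transpose rule in particular is not spelled out above and is an assumption. The `Stepper` class and all function names are hypothetical, and the l array (here [3 5]) is carried separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stepper:
    q: List[int]  # result dimension (1-origin) that each node dimension feeds
    s: List[int]  # start index along each node dimension (0-origin, assumed)
    d: List[int]  # +1 = forward, -1 = backward


def init_stepper(rank: int) -> Stepper:
    # Root initialization: q_i = i, s_i = 0, d_i = 1.
    return Stepper(list(range(1, rank + 1)), [0] * rank, [1] * rank)


def dyadic_transpose(st: Stepper, c: List[int]) -> Stepper:
    # Assumed rule: son dimension i inherits the entries of node dimension c[i].
    return Stepper([st.q[k - 1] for k in c],
                   [st.s[k - 1] for k in c],
                   [st.d[k - 1] for k in c])


def take(st: Stepper, c: List[int], t: List[int]) -> Stepper:
    # t is assumed to be the shape of the take's operand (the son).
    s = [si + ti + ci if ci < 0 else si for si, ti, ci in zip(st.s, t, c)]
    return Stepper(list(st.q), s, list(st.d))


def monadic_transpose(st: Stepper) -> Stepper:
    # Entry i of the son comes from entry n-i+1 of the node.
    return Stepper(st.q[::-1], st.s[::-1], st.d[::-1])


def reversal(st: Stepper, k: int, t: List[int]) -> Stepper:
    # k is the 1-origin axis being reversed; t is the shape at the node.
    s, d = list(st.s), list(st.d)
    s[k - 1] = t[k - 1] - s[k - 1] - 1
    d[k - 1] = -d[k - 1]
    return Stepper(list(st.q), s, d)


leaf_shape = [2, 2, 2]
st = init_stepper(2)                       # the fragment's result has rank 2
st = dyadic_transpose(st, [1, 1, 2])
st = take(st, [-3, 4, 5], leaf_shape)
st = monadic_transpose(st)
st = reversal(st, 1, leaf_shape)           # reversal along the first axis
print(st)
```

Under these assumptions the final stepper is q = [2 1 1], s = [1 0 -1], d = [-1 1 1].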
2.2 The Accessor


After the selector has propagated through the expression tree, the accessor is constructed from the information
contained in the final stepper. The accessor is described by two arrays, γ and δ, defined on [1, max q_i],
and two scalars, α and π. π is the current index into the ravel order of the leaf under consideration; α is the
initial index. γ_i denotes the amount π must be incremented to obtain the next element of the result from the
leaf. δ_i also denotes the amount needed to reach the next element, but now assuming that the accessor has
‘stepped off’ the end along the i-th dimension.


Let η denote the expansion vector of the leaf. The accessor is constructed as follows:
for i in [1, max q_i]
    γ_i ← Σ_{j : q_j = i} d_j * η_j
    δ_i ← γ_i - l_{i+1} * γ_{i+1}    (for the last i, δ_i ← γ_i)


and
    α ← Σ_{i=1}^{n} s_i * η_i
So for the above example we have:
    η = [4 2 1]    γ = [3 -4]
    α = 3          δ = [23 -4]
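
The construction can be checked with a short Python sketch. The final stepper values used here (q = [2 1 1], s = [1 0 -1], d = [-1 1 1], with zero-origin s) are an assumption: they are one set of values consistent with the accessor quoted above, not values taken from the paper's table. The function names are hypothetical.

```python
from typing import List, Tuple


def expansion_vector(shape: List[int]) -> List[int]:
    # eta_n = 1, eta_i = t_{i+1} * eta_{i+1}
    eta = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        eta[i] = shape[i + 1] * eta[i + 1]
    return eta


def build_accessor(q: List[int], s: List[int], d: List[int],
                   l: List[int], eta: List[int]) -> Tuple[int, List[int], List[int]]:
    r = max(q)
    # gamma_i: signed sum of the leaf strides that feed result dimension i
    gamma = [sum(d[j] * eta[j] for j in range(len(q)) if q[j] == i + 1)
             for i in range(r)]
    # delta_i = gamma_i - l_{i+1} * gamma_{i+1}; the last delta equals the last gamma
    delta = [gamma[i] - (l[i + 1] * gamma[i + 1] if i + 1 < r else 0)
             for i in range(r)]
    # alpha: ravel offset of the first element of the result
    alpha = sum(si * ei for si, ei in zip(s, eta))
    return alpha, gamma, delta


eta = expansion_vector([2, 2, 2])
alpha, gamma, delta = build_accessor(
    q=[2, 1, 1], s=[1, 0, -1], d=[-1, 1, 1], l=[3, 5], eta=eta)
print(eta, alpha, gamma, delta)
```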


The accessor is used in the following way. The node at the top of the tree fragment is accessed sequentially,
in ravel order. The action of the accessor will mutate that sequential access into an offset access; that is, when
the operator at the top is given a request for its next value, the accessor will pass down a request for the leaf's
π value. The accessor performs this function by utilizing the δ values and the loop structure shown below.
Let limit be the shape of the top result. count describes the coordinates of the result currently being produced,
and is initialized to [0 0 ... 0].


i ← rank-1;
while i ≥ 0
    π ← π + δ_i
    count_i ← count_i + 1
    if count_i < limit_i
        for i ← i+1 to rank-1 by 1
            count_i ← 0
        break
    else
        i ← i-1


This code is known as the universal looper, and is executed after the desired result has been obtained. In
essence, the universal looper positions π at the next value to be evaluated.
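
A minimal Python sketch of the looper follows, using the example values α = 3, δ = [23 -4] and limit = [3 5]. The function name and the choice to collect offsets in a list are illustrative assumptions, and no bounds checking or padding for overtake is attempted; the sketch only produces the sequence of offsets.

```python
from typing import List


def ravel_offsets(alpha: int, delta: List[int], limit: List[int]) -> List[int]:
    """Offsets visited when the top of the fragment is read in ravel order."""
    rank = len(limit)
    count = [0] * rank
    pi = alpha
    total = 1
    for extent in limit:
        total *= extent
    offsets = []
    for _ in range(total):
        offsets.append(pi)
        # universal looper: move pi and count to the next element
        i = rank - 1
        while i >= 0:
            pi += delta[i]
            count[i] += 1
            if count[i] < limit[i]:
                for j in range(i + 1, rank):
                    count[j] = 0
                break
            i -= 1
    return offsets


offsets = ravel_offsets(3, [23, -4], [3, 5])
print(offsets)
```

Each offset produced this way equals α plus the sum of count_i * γ_i, which is the relation the δ vector is designed to maintain incrementally.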


3. Interaction of the Dyadic Scalar Operators 

As mentioned in the introduction, it has been proposed that the grid selectors can be absorbed and distributed
down through scalar operators, as shown in Figure 3. Except in instances where scalar extension occurs,
the scalar operators do not affect the coordinates of their respective nodes; they only alter the values. Since,
for dyadic operators, the respective values collected from both sides must have the same coordinates, and
hence identical offsets, the obvious approach is to pass a copy of the current stepper down
both paths of descendants. This introduces the notion of selector splitting, which will be discussed shortly.
Figure 3: Commutation of the grid selectors past the
dyadic scalar operators.


Scalar extension, however, may alter the coordinates of the node being accessed and it is therefore necessary
to examine the manner and effects of this alteration. Scalar extension occurs where one, but not both, of
the operands to a dyadic scalar operator has rank zero. In these cases, the rank and shape of the scalar object
is extended to conform to that of the other, multidimensional, operand.


The effects of the grid selectors upon scalar operands vary. Transpose, both monadic and dyadic, and 
reversal are basically identity transformations with respect to scalar arguments. This is an important quality, 
necessary for the commutativity of these grid selectors with the dyadic scalar operators. 


The interaction of these three grid selectors with a dyadic scalar operation in which scalar extension occurs
can be readily seen through an example. Consider ⍉ 3 + A, where A is not limited in rank. If the transpose
is distributed down both paths, then the resulting expression is (⍉ 3) + (⍉ A). If A is nonscalar,
then the extension occurs after the grid selectors have been applied to the leaves. More precisely, the transpose
preserves the triviality of the left operand and hence preserves the effects of the scalar extension. If A
were scalar, then no scalar extension occurs and hence the desired result is obtained.


Drop, on the other hand, is not even defined for scalar operands. This alone puts limitations on the ability
of grid selectors to propagate past dyadic scalar operators. Take has an interesting association with scalars
and scalar extension. Consider the take of a scalar, for instance 3 ↑ 3. The result is 3 0 0. Similarly, 3 3 ↑ 3 creates a
3 × 3 matrix with a 3 in the upper left corner and 0’s elsewhere. Thus, take does not maintain the triviality of
its scalar operands. This special extension prevents take from commuting with dyadic scalar operators which
have one, but not two, scalar operands. Take can commute in the case of both operands being scalar, since the
result of the scalar operation is still a scalar, which is then extended according to take’s unique version of
scalar extension.
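
The failure to commute can be seen in a small Python model. This is only a sketch restricted to scalars and vectors with non-negative take counts; `take` and `add` are hypothetical stand-ins for the APL primitives, not part of any compiler described here.

```python
def take(n, x):
    """APL-style take of the first n items (n >= 0), padding with zeros.
    A scalar is treated as a one-item vector, as in 3 take 3 giving 3 0 0."""
    items = x if isinstance(x, list) else [x]
    return (items + [0] * n)[:n]


def add(a, b):
    """Dyadic scalar + with scalar extension of either operand."""
    a_is_vec, b_is_vec = isinstance(a, list), isinstance(b, list)
    if not a_is_vec and not b_is_vec:
        return a + b
    if not a_is_vec:
        a = [a] * len(b)
    if not b_is_vec:
        b = [b] * len(a)
    return [x + y for x, y in zip(a, b)]


A = [1, 2, 3, 4, 5]
direct = take(3, add(3, A))                 # take applied above the +
distributed = add(take(3, 3), take(3, A))   # take pushed down both paths
print(direct, distributed)
```

The two results differ because take pads the scalar with zeros instead of replicating it, which is exactly the extension that the scalar operator would otherwise have performed.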


The scalar extension of dyadic scalar operators and that of take are mutually exclusive, in that the application
of one negates the need for application of the other. This bit of information will prove useful in the context
of determining at compile time whether grid selectors can be propagated past a dyadic scalar operator.


3.1 Implementation Techniques 


A compiler’s ability to distribute the grid selectors past the dyadic scalar operators is a function of how 
much rank information is known at compile time. If it is known that neither operand is scalar or that there is 
one or more takes beneath an operand whose rank is unknown, then the grid selectors can be commuted with 
the dyadic scalar operator. However, if it is known that the result of either operand before application of the 
scalar operator is a scalar or the rank is unknown and there are no takes down the path under consideration, 
then the grid selector merging process must stop at that point. 


The value of knowing that a take is present is now apparent. The result of a take is never a scalar, so even
if we do not know the rank of a subnode at compile time, since there is a take down that path, the result of that
side cannot be scalar. If there were no take, then the rank may very well be zero and so we cannot propagate
the selector past that point.
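
The compile-time test described above can be written as a small predicate. This is a sketch of the rule as stated in this section, not the compiler's actual code; the representation of an operand as a pair (known rank or None, whether a take lies beneath it) and both function names are assumptions.

```python
from typing import Optional, Tuple

Operand = Tuple[Optional[int], bool]  # (rank if known at compile time, take beneath?)


def operand_permits_merge(rank: Optional[int], take_below: bool) -> bool:
    if rank is not None:
        # Rank is known: merging may continue only if the operand is nonscalar.
        return rank > 0
    # Rank is unknown: a take beneath guarantees a nonscalar result.
    return take_below


def can_commute(left: Operand, right: Operand) -> bool:
    """True if grid selectors may be moved past the dyadic scalar operator."""
    return operand_permits_merge(*left) and operand_permits_merge(*right)


print(can_commute((2, False), (None, True)))    # unknown rank, but a take below
print(can_commute((0, False), (2, False)))      # left operand known to be scalar
print(can_commute((None, False), (2, True)))    # unknown rank and no take
```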


In cases where the commutation can occur, there appear to be two contrasting methods for commuting the
grid selectors with the scalar operators. First, the compiler might make actual changes in the expression tree,
that is, physically alter the structure of the expression tree to correspond to the commutativity of the nodes.
This method, while clearly representing the essence of the commutation, has several drawbacks. Obviously
the number of nodes in the expression tree must be increased. This in itself increases the amount of time
needed to compile the code and more than likely the amount of actual code produced. In addition to this,
selector information is not shared. For instance


(⌽ ⍉ 2 2 ↑ A + ⍉ B)


is mutated into


(⌽ ⍉ 2 2 ↑ A) + (⌽ ⍉ 2 2 ↑ ⍉ B).


This adds three extra nodes, and the code for ⌽ ⍉ 2 2 ↑ is reproduced twice and thus will be executed twice. In the
APL compiler the above expression was used to determine representative time/space measurements in comparison
to the method discussed below. The above method produced 20% more code and used 10% more time
during execution. Clearly these measurements are highly dependent upon the expression tested.


3.2 Selector Splitting 

Another method is to merge the dyadic scalar operators with the rest of the grid selectors. This method 
avoids the drawbacks of the first, while still accomplishing the goal in mind. This method, known as selector 
splitting, treats the dyadic scalar operators as if they were also grid selectors, though actually their function, 
with respect to the merging process, is solely to act as a two-way branch so that previously calculated selector 
information can be passed down both paths. 


The implementation of the dyadic scalar operators as a grid selector is simple. Recall that the selector at a
given point in the tree fragment contains the information to construct an accessor. This accessor has the ability
to produce an offset into the remainder of the expression tree beneath it. If both children of the node
representing the dyadic scalar operator are grid selectors, then at the point that the stepper being propagated
through the tree fragment reaches the dyadic scalar operator, the stepper is duplicated and a copy is passed to
each subnode. If, however, one child is not a grid selector or, for whatever reason, the selector cannot propagate
down the subpath rooted at that child, then the stepper is passed down the mergeable subpath and an
accessor is constructed for the non-mergeable subnode.
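
The branching behaviour can be sketched as a recursive walk over the tree fragment. The `Node` class, its `kind` strings, and the use of a dictionary for the stepper are all hypothetical; the per-operator stepper updates are omitted so that only the duplication at the dyadic scalar operator is shown.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                              # "selector", "scalar_op", or "leaf"
    children: List["Node"] = field(default_factory=list)
    mergeable: bool = True
    stepper: Optional[dict] = None         # recorded where an accessor is built


def propagate(node: Node, stepper: dict) -> None:
    if node.kind == "scalar_op":
        # Two-way branch: each subpath receives its own copy of the stepper.
        for child in node.children:
            propagate(child, copy.deepcopy(stepper))
    elif node.kind == "selector" and node.mergeable:
        # A real implementation would apply this selector's update rule here.
        propagate(node.children[0], stepper)
    else:
        # Leaf or non-mergeable subnode: an accessor is built from this stepper.
        node.stepper = stepper


a, b = Node("leaf"), Node("leaf")
tree = Node("selector", [Node("scalar_op", [Node("selector", [a]), b])])
propagate(tree, {"q": [1, 2], "s": [0, 0], "d": [1, 1], "l": [2, 2]})
print(a.stepper, b.stepper)
```

Both leaves end up with equal but independent stepper copies, so later updates along one subpath cannot disturb the other.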


4. Accessing Modes 


The concept of accessing modes was briefly mentioned in the last section. Basically, there are three accessing
modes: ravel, index, and vector. When a node in the expression tree is accessed in a certain mode, it uses
the information available to produce the value of the desired element. If accessed in ravel mode, the node
delivers its values sequentially in ravel order. If the node is accessed in index mode, then it receives an offset
and must produce the value of the element indexed by that offset. When a node is accessed in vector mode, it
is given a vector of coordinates, called a request vector, and returns the value corresponding to those
coordinates.


Other access modes can well be imagined; for instance, a vector consisting of offsets for each dimension of
the result, the new element's location found by adding this vector to the coordinates for the last one produced;
or a similar concept for index accessing. However, these are simply variations of the three basic modes and
shall not be discussed further.


Recall that the original theory of Guibas utilized a ravel-index mode combination. More explicitly, the
combined code is accessed sequentially; the leaves extending from the tree fragment, however, are accessed
with an offset. Consider the expression A ⌽ ⍉ 2 2 ↑ B. Rotation is not a grid selector. Moreover, the permutation
of coordinates by the rotation prevents the accessing of the grid selector fragment (composed of a
transpose and a take) in ravel order. Yet the fragment is still capable of being composed into a single selector
and the accessing mode should not hinder this optimization. In this section, the application of the Guibas
theory to other access combinations, namely index-index and vector-index, is discussed.


4.1 The Accessor and Universal Loop 


At this point, it is necessary to examine the nature of a few of the tools used in the Guibas method of grid
selector composition. The definition of an expansion vector was given in section 2.1. For an expansion vector
η of A, the component η_i denotes the number of elements in the structure obtained by fixing all coordinate
positions left of i inclusive. If A were accessed in ravel order, η_i elements of A would be produced for each
time the i-th coordinate changes. If it were desired to produce the next element along the i-th dimension, it would
be necessary to cycle through all higher dimensions to arrive at the proper position. However, this can be
simulated by simply adding η_i to the current offset.


Consider the construction of the γ vector. The i-th element of γ is the sum, over all dimensions of the leaf
which contribute to the i-th dimension of the result, of the direction along those dimensions of the leaf times the
amount necessary to arrive at the next element along those dimensions of the leaf. For example, if it were
desired to retrieve the next element along the k-th dimension and q_j = k, then the next element would be that
obtained by incrementing the offset by η_j. If d_j = -1, the offset would be incremented by -η_j, since that
dimension is being traversed backwards. If both q_i = k and q_j = k, then the offset should be incremented by
η_i + η_j, since both dimensions need to be traversed concurrently.


The γ vector acts similarly to an expansion vector, in that γ_i denotes the amount needed to increment the
current offset into the leaf, in order to obtain the next element of the resulting structure along the i-th
dimension.

The purpose of the count vector in the universal loop is now clear. At each entry into the universal loop,
count represents the coordinates of the element just calculated. Since access is sequential, the next element
will be count’s lexicographical successor. The current offset must then be adjusted to reflect the change in
count. The universal loop uses δ to accomplish this alteration; however, another method may be easier to
understand. Let count’ denote count’s lexicographical successor; then count’ - count represents the change
along each dimension. Thus, the sum over all the dimensions of γ_i * (count’_i - count_i) is the exact change in
the current offset to reflect the change in count. In the example of the first section, recall that γ = [3 -4],
δ = [23 -4] and the resulting shape was [3 5].

Suppose that count = [2 2], count’ = [2 3]; then count’ - count = [0 1] and the net change = -4. Suppose
now that count = [2 5], count’ = [3 1]; then count’ - count = [1 -4] and the net change = 19. It is no
coincidence that 19 = 23 + -4. Indeed, the δ vector is constructed with this effect in mind. Instead of determining
the successor of count and then the appropriate addend to the current offset, the new offset and the
lexicographical successor are determined concurrently.


4.2 Vector Accessing 

With the previous discussion in mind, we now present a method for applying the theory of grid selector 
composition to convert vector access to index access. Accessing modes are not purely one type or another; the 
selector composition code, while basically ravel-index, has a heavy tinge of vector-index accessing already. It 
is evident from the remarks in the preceding paragraphs that count can be thought of as the vector whose 
coordinates represent the value to be produced and the sequential ordering is performed by determining the 
lexicographical successor each time through the universal loop. 


Count is known only in the framework of the merged operators; it is not passed to the top grid selector as a
request vector. But its use makes the transition to vector-index accessing straightforward. Suppose that
count is passed as a request vector, defining the coordinates of the desired element. The universal loop has no
notion of a last element accessed; however, the very nature of coordinates assumes an origin, and thus the vector
specifies steps along each dimension from the origin. For each of the dimensions, the γ vector indicates
the appropriate adjustments to be made in the offset. The initial offset, which corresponds to the vector
[0 0 ... 0], is calculated when the accessors are constructed, namely α. For a vector rv, the corresponding
index into the leaf is the initial offset, α, plus the sum over the dimensions of rv_i * γ_i. Since the relation
between two successive vectors is unknown, the universal loop is not needed to determine the lexicographical
successor of count. Hence, the code for the universal loop is no longer necessary and can be omitted entirely.
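
In Python, vector-index access reduces to a single dot product. The sketch below uses the example accessor (α = 3, γ = [3 -4]) and zero-origin request vectors; the function name is a hypothetical label.

```python
from typing import List


def vector_to_offset(alpha: int, gamma: List[int], rv: List[int]) -> int:
    """Leaf offset for a zero-origin request vector rv: alpha + sum(rv_i * gamma_i)."""
    return alpha + sum(r * g for r, g in zip(rv, gamma))


print(vector_to_offset(3, [3, -4], [0, 0]))
print(vector_to_offset(3, [3, -4], [1, 0]))
print(vector_to_offset(3, [3, -4], [2, 4]))
```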


4.3 Index Accessing 


The solution for index access mode is not as easily derived from the existent algorithm. Ravel- and
vector-index modes were united by the concept of stepping along one or more dimensions, for which the associated
adjustments in the offset to the leaf were known. However, when given an offset, there is no information
immediately available with regards to the offset’s position relative to the origin.


Two solutions present themselves. First, the offset could be converted into a request vector, which could
then be transformed into an offset by the vector accessing method. Second, if n is the offset, then the element
accessed by that offset is the n-th sequential element that would be accessed in ravel mode. In this case, the
universal loop could be simulated n times to arrive at the appropriate index into the leaf.


The method for converting a vector into an index has already been mentioned. It would seem reasonable
that converting an index into a vector would not be difficult. For an offset n, the value of its associated vector
v is dependent upon the shape of the object. The element v_i = (n mod η_{i-1}) / η_i, where the division is
integer division and η_0 is taken to be the total number of elements. The modular division in
effect ignores the lower dimensions, and the division selects the number of steps along the i-th dimension. For
example, if an object A has shape [3 4 5], then η = [20 5 1]. The element addressed by the offset 37 would be
[1 3 2]. Another way to think of this is to solve the system of equations:


    20x + 5y + z = 37
    x < 3
    y < 4
    z < 5


and x, y, z are greater than or equal to zero. Obviously, only one solution exists. Since only the individual 
components are needed, the entire vector need not be calculated, which would necessitate two loops. Instead, 
consider the algorithm below: 


offset ← α
for i ← 0 to rank-1 by 1
    offset ← offset + (index div η_i) * γ_i
    index ← index mod η_i


This algorithm determines each successive value of v, updates the offset accordingly, and uses only a single
loop. index is the given index, offset is the new index and η is the expansion vector for the resulting node of
the tree fragment.
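
A Python rendering of this first method is sketched below, together with the plain index-to-vector conversion from the shape [3 4 5] example. The function names are hypothetical, indices are zero-origin, and the accessor values (α = 3, γ = [3 -4], result shape [3 5]) are those of the running example.

```python
from typing import List


def expansion_vector(shape: List[int]) -> List[int]:
    # eta_n = 1, eta_i = t_{i+1} * eta_{i+1}
    eta = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        eta[i] = shape[i + 1] * eta[i + 1]
    return eta


def unravel(index: int, shape: List[int]) -> List[int]:
    """Convert a ravel-order index into its coordinate vector."""
    coords = []
    for step in expansion_vector(shape):
        coords.append(index // step)
        index %= step
    return coords


def index_to_offset(index: int, alpha: int, gamma: List[int],
                    result_shape: List[int]) -> int:
    """Single-loop conversion: each coordinate is used as soon as it is found."""
    eta = expansion_vector(result_shape)   # expansion vector of the top node
    offset = alpha
    for i in range(len(result_shape)):
        offset += (index // eta[i]) * gamma[i]
        index %= eta[i]
    return offset


print(unravel(37, [3, 4, 5]))
print(index_to_offset(5, 3, [3, -4], [3, 5]))
print(index_to_offset(14, 3, [3, -4], [3, 5]))
```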

An alternate method also uses a single loop, but avoids the extra space and time used in constructing the
expansion vector of the top node. The algorithm for this method simulates executing the universal loop n
times, where n is the value of the index. The central idea behind this method is the structure of the δ vector.
Recall that the sum of the elements of δ of dimension greater than or equal to i determines the adjustment
necessary to advance to the next element after cycling through all dimensions higher than i, and that δ_i is
added in each time the step is along a dimension less than or equal to i. So it is necessary to determine how
many times each of δ's components will be added to the new offset. From the construction of the universal
loop it is clear that each δ_i will be added in n div η_i times. This idea is reflected in the following algorithm:


offset ← α
for i ← rank-1 to 0 by -1
    offset ← offset + index * δ_i
    index ← index div l_i


Notice that by looping backwards through the dimensions the calculation of the expansion vector becomes 
unnecessary. This shortcut was not possible in the first case without further calculations. 
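
The second method can be sketched the same way. The code below assumes zero-origin indices and the running example's accessor (α = 3, δ = [23 -4], l = [3 5]); the function name is hypothetical. It is compared against the direct formula α + Σ rv_i * γ_i with γ = [3 -4].

```python
from typing import List


def index_to_offset_delta(index: int, alpha: int, delta: List[int],
                          l: List[int]) -> int:
    """Simulate `index` passes of the universal loop without running them:
    delta[i] is added once for every step that reaches dimension i."""
    offset = alpha
    for i in range(len(l) - 1, -1, -1):
        offset += index * delta[i]
        index //= l[i]
    return offset


via_delta = [index_to_offset_delta(n, 3, [23, -4], [3, 5]) for n in range(15)]
print(via_delta)
```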


5. Conclusion 


The language APL presents many novel problems for a compiler writer: weak variable typing, run time
changes in variable shape, and a host of primitive operations. In response to these difficulties there are many
directions that could be taken. The simplest scheme would be to generate code that merely simulates a naive
interpreter, i.e. performs each operation in its entirety before progressing to the next operation. This has the
disadvantage of generating code with a high ratio of control to computation code, and uses an inordinately
large amount of intermediate storage.


It is the task of the compiler writer to try to discover ways of generating code which combines many APL
operations together in order to avoid the construction of these conceivably large temporaries. Notice that in
doing so there may be definite advantages to a compiler. An interpreter must, by necessity, coexist with the
programs it executes, and thus is forced to be as small as possible. Since a compiler is not present when code is
executed, it can afford to use algorithms and analysis of the program that could not be considered in an
interpreter. The savings may be substantial. Timings performed on an APL compiler using the technique
described in the paper show the resulting code is better than many APL interpreters, often by a significant
amount.